Mining Alzheimer’s disease clinical data: reducing effects of natural aging for predicting progression and identifying subtypes

Introduction: Because Alzheimer’s disease (AD) has significant heterogeneity in encephalatrophy and clinical manifestations, AD research faces two critical challenges: eliminating the impact of natural aging and extracting valuable clinical data for patients with AD. Methods: This study attempted to address these challenges by developing a novel machine-learning model called tensorized contrastive principal component analysis (T-cPCA). The objectives of this study were to predict AD progression and identify clinical subtypes while minimizing the influence of natural aging. Results: We leveraged a clinical variable space of 872 features, including almost all AD clinical examinations, which is the most comprehensive AD feature description in current research. T-cPCA yielded the highest accuracy in predicting AD progression by effectively minimizing the confounding effects of natural aging. Discussion: The representative features and pathogenic circuits of the four primary AD clinical subtypes were discovered. As confirmed by clinical doctors in Tangdu Hospital, the plaque (18F-AV45) distribution of typical patients in the four clinical subtypes is consistent with the representative brain regions found in the four AD subtypes, which further offers novel insights into the underlying mechanisms of AD pathogenesis.


Introduction
Aging and age-related chronic diseases in the elderly, including age-associated memory impairment, mild cognitive impairment (MCI), and Alzheimer's disease (AD), belong to the aging syndrome category (Fried, 2012). It is traditionally understood that AD is the result of a transformation from quantitative to qualitative changes within the normal aging process, indicating a close correlation between AD and normal aging in terms of clinical symptoms (Petersen, 2000). For instance, as an intermediate stage between normal aging and dementia, MCI is easily mistaken for natural aging because the pathological changes presented by patients in the early stages are not obvious, which makes it difficult to achieve precise treatment in the early stages of the disease (Platero and Tobar, 2020). Moreover, aging of the human body may lead to memory loss and a decline in brain function, which are considered essential factors that affect AD diagnosis (Tranah et al., 2012). Consequently, accurately distinguishing the specific symptoms of AD from those associated with natural aging in the early stages is essential to enhance the effectiveness of AD diagnosis and improve the cure rate.
Han et al. 10.3389/fnins.2024.1388391. Frontiers in Neuroscience. frontiersin.org
Various biomarkers and clinical symptoms have been employed in AD correlational research, such as evaluating AD progression and identifying AD subtypes based on neuroimaging and biological detection technology. The pathophysiological changes in AD mainly include amyloid deposition, neurofibrillary tangles, and neurodegeneration (Yang et al., 2022). Nonetheless, existing biomarkers lack a longitudinal perspective and simultaneously combine the effects of natural aging owing to the complex neurodegenerative pathogenesis (Franzmeier et al., 2020). Additionally, there is no consensus on the most effective biomarker for early diagnosis because each biomarker differs in terms of sensitivity, specificity, and reliability. For instance, almost one-third of clinically diagnosed AD patients do not have an accumulation of Aβ in specific brain regions, such as the hippocampus and frontal lobe, and many people who had Aβ accumulation after death did not show cognitive impairment during their lifetime (Sekiya et al., 2018). Therefore, the extraction of more useful features with minimal interference from natural aging for subsequent AD analyses is a major challenge.
Research on AD has focused on feature engineering based on extensive diagnostic data, driven by the high heterogeneity of disease progression among patients. In the early 21st century, significant advancements in imaging diagnosis technology and AD pathology research led to the emergence of many neuroimaging-based AD progression prediction methods (Zhu et al., 2014; Hyun et al., 2016; Sarica et al., 2017), which focus on dimension reduction (Duchesne et al., 2009; Zhu et al., 2013) and feature selection techniques (Li et al., 2013, 2015). Chen et al. presented a model named Low-rank Sparse Feature Selection with Incomplete Labels (LSFSIL) for predicting cognitive performance and identifying informative neuroimaging markers with MRI data (Chen et al., 2022). Lu et al. proposed a novel method to learn an enriched representation for imaging biomarkers (Lu et al., 2021). Jiang et al. proposed a novel multi-task learning formulation, which considers a correlation-aware sparse and low-rank constrained regularization, for accurately predicting the cognitive scores and identifying the most predictive biomarkers (Jiang et al., 2018). Because deep learning algorithms can mine the latent features in image data, image analysis using deep learning has become the main research direction for this problem (Lin et al., 2018; Jo et al., 2019; Abrol et al., 2020). Liu et al. proposed an ensemble learning framework based on artificial neural networks to create effective models for AD/MCI prediction from multiple modalities of neuroimaging and multiple baseline estimators (Liu et al., 2016). Hojjati et al. utilized unimodal/bimodal neuroimaging measures and a non-linear regression method (based on artificial neural networks) to predict the neuropsychological scores (Hojjati et al., 2022). Hoang et al. proposed Vision Transformers (ViT) to make an MCI-to-AD prediction based on structural magnetic resonance images (Hoang et al., 2023). However, subsequent studies have revealed that AD is a heterogeneous disease influenced by diverse pathophysiological mechanisms beyond conventional understanding (Neff et al., 2021). Neuroimaging biomarkers only represent a portion of the clinical manifestations of AD. Other critical indicators, such as cognitive evaluations, remain underutilized in subsequent analyses, thus failing to fully capture the development of AD.
As a longitudinal multicenter study aimed at developing clinical, imaging, genetic, and biochemical biomarkers for the early detection and tracking of AD progression, the Alzheimer's Disease Neuroimaging Initiative (ADNI) provides comprehensive clinical diagnostic data that offer a holistic view of AD across multiple domains (Petersen et al., 2010). Time-series-based research on AD has recently received attention from the academic community. Liang et al. proposed a multitask learning framework that can adaptively impute missing values and predict AD progression over time from a subject's historical measurements, including MRI volumetric measurements, trajectories of a cognitive score, and clinical status (Liang et al., 2021). Ho et al. proposed a bidirectional progressive recurrent network with imputation (BiPro) that uses longitudinal data to forecast clinical diagnoses and phenotypic measurements (Ho et al., 2022). El-Sappagh et al. proposed a novel two-stage deep learning AD progression detection framework based on information fusion of several patient longitudinal multivariate modalities (El-Sappagh et al., 2022). However, many prediction methods using ADNI datasets focus solely on the feature dimension of the AD data, disregarding the temporal dimension. This oversight leads to inaccurate predictions because crucial temporal trends in the clinical features are missed. Moreover, during the aging process of the healthy elderly, various physical and mental functions fluctuate or gradually degrade, interfering with the detection of various biomarkers (Ezzati et al., 2019). Therefore, it is challenging to predict AD progression based on the time and feature dimensions while simultaneously alleviating the interference caused by natural aging to reliably differentiate between normal cognitive aging, MCI, and AD.
An effective algorithm is essential for analyzing AD clinical data. Principal component analysis (PCA) is a classical algorithm that maps the data points in a high-dimensional space into a lower-dimensional space to extract the main components of features and reveal their main characteristics, and it is widely used in machine learning and data mining (He et al., 2021). Because it retains useful information and removes as much redundant information as possible from high-dimensional data, PCA is well suited to AD data, which have high dimensionality and a small number of samples. Thus, based on PCA, a novel machine learning approach called Tensorized contrastive PCA (T-cPCA) was proposed in this study to develop an AD longitudinal clinical data representation for AD progression prediction and AD subtype identification, with the advantage of eliminating the effects of natural aging. T-cPCA captures low-dimensional structures that are enriched in the target dataset relative to the background data, providing a more accurate analysis of AD data. Based on the ADNI dataset, T-cPCA was applied to obtain the fusion features of the time and feature dimensions and was further used for AD progression prediction. Moreover, as a concept focusing on the characteristics of the typical pathological changes in AD combined with multiple groups of biomarkers (Ferreira et al., 2021), AD subtypes have a wide range of value-in-use and prospects for clinical application. In this context, and to overcome the current limitations of horizontal AD subtyping, this study proposed a novel multidimensional time series representation method termed Tensorized contrastive Principal Component Analysis (T-cPCA) for predicting AD progression and identifying AD clinical subtypes. In AD progression prediction with stratified three-fold cross-validation, T-cPCA delivers the highest accuracy (ACC) compared with six typical PCA extension methods. Ablation experiments indicated the effectiveness of fusion features for AD progression prediction. Moreover, the identified AD clinical subtypes can be further used to improve the prediction accuracy, which shows that the discovery of AD clinical subtypes is a critical step toward precision medicine for this devastating disease. In addition to the salient features and pathogenic circuits, the clinical manifestations and targeted treatments of the AD subtypes are identified for AD pathophysiological mechanism research, which brings new insights into the mechanisms underlying the pathogenesis of AD and paves the way for early diagnosis.

Overview of T-cPCA
As an extension of contrastive PCA (cPCA), which aims to determine contrastive principal components (cPCs) that maximize the variance in the target dataset and minimize the variance in the background dataset (Yu et al., 2020; Yu and Liu, 2020), T-cPCA first represents the multidimensional clinical data as tensors. We included participants who had never suffered from AD in the cognitively normal (CN) cohort and those who had dementia in the dementia cohort. Considering the feature and time dimensions, we conducted contrastive PCA on each of these two dimensions to determine the cPCs on each dimension. Finally, the AD clinical representation was obtained by integrating the features from the two dimensions for AD progression prediction and clinical subtype identification (Figure 1).
The principle of T-cPCA is as follows (Equations 1-7). Suppose we have a multidimensional time series dataset $S = \{S_i\}_{i=1}^{n}$. Here each $S_i$ can be written as a tensor: the horizontal axis represents the various medical testing features (MRI, PET, etc.), and the vertical axis represents the time span $(T_1, T_2, \ldots, T_p)$, so that the number of features is $k$ and the number of time points is $p$. First, we partition the dataset to obtain the target data and the background data. On the feature dimension and on the time dimension in turn, we aim to find the contrastive directions that maximize the variance of the target data while minimizing the variance of the background data. For the time dimension, this yields the matrix $M_{time}$, which is composed of the eigenvectors corresponding to the top $m_{time}$ cPCs, where $m_{time} \le p$. The final mapped data are obtained from these projections. The details of T-cPCA are provided in Supplementary material A.
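As a rough illustration of the idea, the following minimal sketch applies the standard cPCA objective, that is, the top eigenvectors of $C_{target} - \alpha C_{background}$, separately to the feature and time dimensions. It is not the authors' implementation: the function names, the array layout (subjects × time points × features), the parameters `m_feature` and `m_time`, and in particular the two-sided projection used to fuse the two dimensions are all assumptions made for illustration.

```python
import numpy as np


def contrastive_directions(target, background, alpha, n_components):
    """Top eigenvectors of C_target - alpha * C_background (standard cPCA).

    target, background: 2-D arrays with one observation per row.
    """
    c_target = np.cov(target, rowvar=False)
    c_background = np.cov(background, rowvar=False)
    # The contrast matrix is symmetric, so eigh returns real eigenpairs
    # sorted by ascending eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(c_target - alpha * c_background)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]


def tcpca_sketch(target, background, alpha_feature, alpha_time,
                 m_feature, m_time):
    """target, background: arrays of shape (n_subjects, p, k)."""
    _, p, k = target.shape
    # Feature dimension: each (subject, visit) row is one observation
    # of the k features.
    m_feat = contrastive_directions(
        target.reshape(-1, k), background.reshape(-1, k),
        alpha_feature, m_feature)
    # Time dimension: each (subject, feature) column is one observation
    # of the p time points.
    target_time = target.transpose(0, 2, 1).reshape(-1, p)
    background_time = background.transpose(0, 2, 1).reshape(-1, p)
    m_tim = contrastive_directions(
        target_time, background_time, alpha_time, m_time)
    # Assumed fusion step: project every subject matrix on both sides,
    # M_time^T S_i M_feature, giving an (m_time x m_feature) block.
    fused = np.einsum("pt,npk,kf->ntf", m_tim, target, m_feat)
    return m_feat, m_tim, fused
```

Larger values of `alpha_feature` and `alpha_time` penalize directions with high variance in the background (CN) cohort more strongly, which is the mechanism by which contrastive PCA suppresses variation shared with the background data.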

Participants and measurements
The ADNI database was initiated in 2003 by several national institutes, including the Food and Drug Administration, private pharmaceutical companies, and nonprofit organizations in the United States. This longitudinal multicenter study aimed to develop clinical, imaging, genetic, and biochemical biomarkers for the early detection and tracking of AD. The ADNI dataset is updated regularly, and the longitudinal clinical data of 1,631 participants had been collected by 2022. The TADPOLE challenge provides a standardized dataset exported from the ADNI with the richest clinical features for AD research. In the experiments, we utilized the dataset provided by TADPOLE, which includes the following: (1) amyloid and CSF biomarkers of tau deposition; (2) various biomarker analysis methods, including positron emission tomography with several different tracers, namely fluorodeoxyglucose (FDG), AV45 (amyloid), and AV1451 (tau proteins), and DTI; (3) cognitive evaluation performed in the presence of clinical experts; (4) genetic information extracted from DNA, such as APOE4 expression level; (5) demographic information, including age, gender, education level, race, and marital status. In this study, the dataset provided by the TADPOLE challenge included the clinical diagnostic data of 1,631 participants, who are divided into CN, MCI, and AD groups (Table 1).
The longitudinal nature of the ADNI dataset was evident in its continuous collection of clinical diagnostic data for each participant unless they passed away or withdrew voluntarily for other reasons. These long-term data provide a robust foundation for further research on AD.

Model parameter selection
In T-cPCA, increasing the values of the hyperparameters $\alpha_{feature}$ and $\alpha_{time}$, which denote the trade-off between the dementia and CN cohorts, can improve the elimination of natural aging effects. Considering the complexity of the T-cPCA algorithm, we developed an intelligent evolutionary algorithm to rapidly and effectively determine the optimal solution for the hyperparameters. With the advantages of strong global and local searching capabilities, the firework algorithm is a swarm intelligence algorithm widely used for solving optimization problems in various domains, including image recognition and spam detection (Zheng et al., 2015). Information is exchanged among the fireworks and is characterized by the number of explosion sparks and the explosion radius, improving the suitability of the firework algorithm for the high-dimensional, small-sample characteristics of AD data. Therefore, we propose an improved extension of the traditional firework algorithm specifically tailored for AD data to enhance its computational efficiency. For details of the algorithm, please refer to Supplementary material A.

AD progression prediction
Following the principle of preserving comprehensive information from the feature and time dimensions, we first aligned the data in the time dimension. Subsequently, missing values in the diagnostic data were imputed using similar diagnostic results. We used the data of participants who had never suffered from AD as background data, the data of patients who had suffered from AD as target data, and the change in future disease development as the target data label, denoted better (1), worse (−1), and unchanged (0). The prediction time spans were 1, 3, and 5 years (Table 2).

FIGURE 1 Modeling framework and overall strategy.

To verify the effectiveness of the proposed method, T-cPCA was compared with the original multidimensional time series (OR); statistical features calculated from the time series, including the mean, standard deviation, maximum, minimum, variance, skewness, and kurtosis (SOR); six typical PCA representation methods, comprising two-dimensional PCA (2DPCA) and kernel PCA with a radial basis function kernel (PCA1), a rational quadratic kernel (PCA2), a linear kernel (PCA3), a polynomial kernel (PCA4), and a sigmoid kernel (PCA5); and two ablation models (T-cPCA with only the feature representation and T-cPCA with only the time representation). Each representation was evaluated with three popular supervised machine-learning algorithms: Multilayer Perceptron (MLP), Random Forest (RF), and K-Nearest Neighbors (KNN). The technical details of these machine-learning algorithms are given in Section A.5 of the Supplementary materials. The prediction tasks were 1-, 3-, and 5-year AD progression predictions with stratified three-fold cross-validation. The evaluation indices of the models were accuracy, recall, and F1 score.
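The seven summary statistics that make up the SOR baseline can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and the use of population (biased) moments and of excess kurtosis is an assumption, since the text does not state which estimator was used.

```python
import math


def summary_statistics(series):
    """Mean, std, max, min, variance, skewness, and excess kurtosis.

    Population (biased) moments are assumed here.
    """
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series) / n
    std = math.sqrt(variance)
    if std == 0:
        # Constant series: shape statistics are set to 0 by convention.
        skewness = kurtosis = 0.0
    else:
        skewness = sum((x - mean) ** 3 for x in series) / (n * std ** 3)
        kurtosis = sum((x - mean) ** 4 for x in series) / (n * std ** 4) - 3.0
    return [mean, std, max(series), min(series), variance, skewness, kurtosis]


def sor_features(visits):
    """visits: list of time points, each a list of k feature values.

    Returns a flat vector with seven statistics per feature.
    """
    features = []
    for column in zip(*visits):  # one column per clinical feature
        features.extend(summary_statistics(list(column)))
    return features
```

A participant with p visits and k features is thus reduced to a vector of length 7k, which discards the ordering of the visits; this is the information that a time-aware representation aims to retain.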
In contrast to the six typical PCA representation methods, the representation obtained by T-cPCA had the highest classification accuracy and delivered the highest specificity for progression prediction using three popular supervised machine-learning algorithms (Table 3). The prediction results in Table 3 underwent paired-sample t-tests at a significance level of 0.01: if the p-value ≤ 0.01 and the t-statistic > 2, we reject the null hypothesis $H_0$ (T-cPCA is not better on average than the comparison models) and accept the alternative hypothesis $H_1$ (T-cPCA is better on average), which means that the prediction performance of T-cPCA is significantly superior to that of the comparison models.
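The t-statistic of a paired-sample test can be sketched as below. This is a minimal illustration with a hypothetical function name; the scores passed to it would be the paired evaluation indices of T-cPCA and one comparison model. The one-sided p-value would then be read from a Student t distribution with n − 1 degrees of freedom, a step that is not included here.

```python
import math


def paired_t_statistic(scores_a, scores_b):
    """t-statistic of the paired differences scores_a - scores_b."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two paired observations are required")
    mean_d = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    if var_d == 0:
        raise ValueError("all paired differences are identical")
    return mean_d / math.sqrt(var_d / n)
```

A positive statistic indicates that the first model scores higher on average; the decision rule in the text additionally requires the statistic to exceed 2 and the p-value to be at most 0.01.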
Table 4 reports the standard deviations of the predictions in Table 3, obtained from ten runs of all models.
With regard to the prediction results of different time spans, the AD clinical representation by T-cPCA showed the highest accuracy in the tasks of 1-, 3-, and 5-year prediction, suggesting that T-cPCA could capture the long-term change characteristics of AD development. Moreover, MLP is more effective in predicting AD progression based on T-cPCA. We infer that T-cPCA, as an extension of the PCA series of algorithms, can extract effective features that integrate the fusion information of the time and feature dimensions, thus providing novel insights for analyzing the internal trend of AD data.
To further demonstrate the effectiveness of T-cPCA, four typical deep neural network structures, namely a convolutional neural network (CNN), a long short-term memory network (LSTM), a gated recurrent unit network (GRU), and a bidirectional LSTM (BiLSTM), were applied to AD progression prediction based on T-cPCA and on the original multidimensional time series (Table 5). Each deep neural network used for comparison was constructed from a two-layer typical network structure and a four-layer fully connected network structure. The experimental results indicate that MLP based on T-cPCA delivers the highest ACC among all prediction models, including the deep learning models, which suggests that these deeper architectures are not as effective as MLP in predicting AD progression on this dataset.

AD clinical subtypes identification
Many genetic, metabolic, and clinical studies have provided evidence for the existence of distinct AD subtypes. The identification of these subtypes helps improve AD biomarker identification, targeted pathological research, correct patient diagnosis, and efficient drug development. In this section, hierarchical clustering is applied to the clinical representation obtained by T-cPCA to identify AD clinical subtypes. By maximizing a clustering effectiveness evaluation index, the silhouette score, we identified four clinical AD subtypes. For each subtype, three machine-learning algorithms with the same hyperparameters were used to verify the effectiveness of the clustering results. The details are shown in Supplementary material B.
After clustering, the classification results obtained by applying classifiers with the same parameters in each cluster showed considerable improvement over the prediction results before clustering, particularly for the key evaluation index (ACC); the one-year prediction task showed an almost 10% improvement in ACC (Table 6). On the one hand, this demonstrates the effectiveness of the four identified AD clinical subtypes. On the other hand, the AD progression prediction performance can be further improved by training classifiers in different clinical subtypes. For the five-year prediction task with the KNN classifier, the prediction performance of the model decreased after clustering, owing to the small amount of data. Overall, the experimental results show that the clinical representation extracted by T-cPCA is significant and can provide a solid foundation for future research.
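The silhouette score used to choose the number of clusters can be sketched as follows. This is a minimal pure-Python illustration with a hypothetical function name and Euclidean distance assumed; the clustering itself (hierarchical clustering in the text) is taken as given and only supplies the candidate labels.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient for points and their cluster labels."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    n = len(points)
    total = 0.0
    for i in range(n):
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j in range(n):
            if i != j:
                sums[labels[j]] += math.dist(points[i], points[j])
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            continue  # singleton cluster: its silhouette is taken as 0
        a = sums[own] / counts[own]  # mean distance within own cluster
        b = min(sums[c] / counts[c] for c in clusters if c != own)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n
```

Running the clustering for several candidate numbers of clusters and keeping the labeling with the highest score corresponds to the selection rule described above.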

Horizontal characterization of AD clinical subtypes
The clinical features of the different AD subtypes may hold promise for the early diagnosis of AD. Therefore, the Gini index was used to calculate the importance of the features $x_1, x_2, \ldots, x_k$ in each subtype. The Gini index of node $q$ of the $i$th tree is calculated as (Equations 8-11):

$$GI_q^{(i)} = \sum_{c=1}^{C} p_{qc}\,(1 - p_{qc}) = 1 - \sum_{c=1}^{C} p_{qc}^{2} \tag{8}$$

Here $C$ indicates that there are $C$ categories, and $p_{qc}$ denotes the proportion of category $c$ in node $q$. The importance of $x_j$ in node $q$ of the $i$th tree is the change in the Gini index before and after branching:

$$VIM_{jq}^{(Gini)(i)} = GI_q^{(i)} - GI_l^{(i)} - GI_r^{(i)} \tag{9}$$

where $GI_l^{(i)}$ and $GI_r^{(i)}$ denote the Gini indices of the two new nodes after branching. If the variable $x_j$ appears $m$ times in the $i$th tree, at nodes $q_1, \ldots, q_m$, then the importance of feature $x_j$ in the $i$th tree is:

$$VIM_{j}^{(Gini)(i)} = \sum_{s=1}^{m} VIM_{j q_s}^{(Gini)(i)} \tag{10}$$

The Gini importance of the $j$th feature in RF is then defined as:

$$VIM_{j}^{(Gini)} = \sum_{i=1}^{N} VIM_{j}^{(Gini)(i)} \tag{11}$$

where $N$ is the number of trees. Finally, we normalize all Gini importance scores.

Table note: the first significance marker means that the prediction performance of the comparison models is significantly worse than that of T-cPCA (excluding ablation models); ◊ means that the prediction performance of T-cPCA is significantly better than that of more than half of the comparison models (excluding ablation models). The bold values are the best results in each row.
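The Gini importance computation described above can be sketched as follows. The function names and the input layout (one list of recorded splits per tree, each with class counts for the parent and the two child nodes) are hypothetical and chosen only for illustration. The sketch follows the unweighted change in the Gini index given in the text; library implementations such as scikit-learn instead weight each child's impurity by its share of the samples.

```python
def gini_index(class_counts):
    """Gini index of a node from its per-class sample counts."""
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


def split_importance(parent, left, right):
    """Change in the Gini index before and after branching."""
    return gini_index(parent) - gini_index(left) - gini_index(right)


def normalized_gini_importance(forest_splits, n_features):
    """Sum split importances per feature over all trees, then normalize.

    forest_splits: one list per tree of tuples
    (feature_index, parent_counts, left_counts, right_counts).
    """
    scores = [0.0] * n_features
    for tree_splits in forest_splits:
        for feature, parent, left, right in tree_splits:
            scores[feature] += split_importance(parent, left, right)
    total = sum(scores)
    if total == 0:
        return scores
    return [score / total for score in scores]
```

After normalization the scores sum to 1, so a fixed threshold on the normalized score can be used to select the representative features of each subtype.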
We selected features with a Gini index higher than 0.02 as the representative features of the different subtypes. The salient brain regions affected by the four clinical AD subtypes are shown in Figure 2. The details of the top 10 important features of the four subtypes are provided in Supplementary material B.
In the first clinical AD subtype, the salient-affected brain regions were located in the corpus callosum, right cuneus, left inferior temporal gyrus, left superior frontal gyrus, left transverse temporal gyrus, left middle temporal gyrus, left superior frontal gyrus, and left hippocampus (Figure 2A). Alterations in the corpus callosum emerged as an early manifestation of this AD subtype progression. As the largest white matter structure, the corpus callosum receives blood from several major arterial systems and plays a critical role in the onset of AD (Das et al., 2021). Accumulation of amyloid-beta peptide (Aβ) is related to callosal myelination, leading to an imbalance in glial cells, an increased presence of phagocytic microglia and reactive astrocytes, and reduced numbers of oligodendrocyte progenitor cells (Aires et al., 2022). The mean diffusivity of the corpus callosum measured using DTI showed a significant decrease in fractional anisotropy among patients with AD (Xiao et al., 2022). MRI has revealed that atrophy of the posterior corpus callosum is positively associated with apathy in patients with AD (Yu et al., 2020). Additionally, the compromised white matter microstructure in the posterior section of the corpus callosum is associated with poorer semantic fluency (Sánchez et al., 2020). Furthermore, combination therapy with Donepezil and Rivastigmine has demonstrated significant improvements in the size of the corpus callosum in patients with severe Alzheimer's disease (Khasawneh et al., 2022).
In the second clinical AD subtype, prominent changes were observed in specific brain regions, including the left anterior cingulate gyrus, left supramarginal gyrus, right precentral gyrus, and right postcentral gyrus (Figure 2B). The affected areas primarily involve the cingulate gyrus and supramarginal gyrus, which are crucial hubs for information processing and regulation within the brain (Yuan et al., 2022). Research has demonstrated that a higher tau signal (CSF Aβ 42/40 ratio) and reduced gray matter density in the posterior cingulate cortex and angular gyrus are associated with decreased parietal functional connectivity in individual patients (Berron et al., 2020), leading to memory decline (Yasar et al., 2020). Blood oxygenation level-dependent signals measured using resting-state functional magnetic resonance imaging have been associated with AD development (Zheng et al., 2020). In patients with AD, cerebral blood flow (CBF) decreases from the early stages, as processes preceding and following the onset of cerebrovascular risk factors or stroke may trigger amyloid-beta deposition in the precuneus/posterior cingulate cortex. The epsilon 4 allele of the apolipoprotein E (APOE) gene may accelerate age-related cortical thickening and reduction in CBF in the anterior cingulate cortex (Hays et al., 2020). In addition, the posterior cingulate gyrus is particularly activated during the recollection of personal events and inference of others' mental states, and dysfunction in this area contributes to cognitive decline in tasks involving verbal information storage, drawing abilities, and nonverbal abstract reasoning among individuals with AD (Takenoshita et al., 2020). Significant correlations have been observed between the functional connectivity of the anterior cingulate cortex and episodic memory dysfunction and executive function impairments. Evidence suggests that angiotensin II type-1 receptor blockers may protect against memory decline by reducing the rates of amyloid-beta accumulation in this AD subtype (Ouk et al., 2021).
For the third clinical AD subtype, the salient-affected brain regions were located in the left temporal lobe, left supramarginal gyrus, right occipital lobe, right superior temporal gyrus, left anterior cingulate gyrus, and right paracentral lobule (Figure 2C). The affected brain regions were primarily located in the temporal lobe. Seizures that occur early in the course of AD are likely to originate from the mesial temporal lobe, which is one of the first structures affected by Alzheimer's pathology and one of the most epileptogenic regions in the brain. Genetic mutations associated with AD increase the tau levels, and the accumulation of tau linearly increases neuronal hyperexcitability, leading to seizures (Zawar and Kapur, 2023). The presence of baseline CSF Ptau is related to the loss of structural stability in connectivity within the medial temporal lobe (Chen et al., 2020). During the early stages of MCI, hyperconnectivity within the ventral medial temporal lobe structures and hypoconnectivity between the dorsal medial temporal lobe regions and the anterior/posterior midline default-mode network nodes are crucial biomarkers for early AD diagnosis, which can further progress to cortical atrophy in the occipital temporal lobe (Sintini et al., 2020). The clinical symptoms include temporal lobe epilepsy, situational amnesia, and worse executive functioning, language, and attention (Visser et al., 2020). The entrainment of neural oscillations in the occipital cortices through external rhythmic visual stimuli shows promise as a novel therapy for AD patients with this subtype (Wiesman et al., 2021). Rapamycin, an immune system inhibitor and a longevity drug, may be a potential treatment for this AD subtype by rescuing proteins in the temporal lobe (Wang et al., 2019).
For the fourth clinical AD subtype, the salient-affected brain regions were located in the left entorhinal cortex, left middle temporal gyrus, right cingulate gyrus, right transverse temporal gyrus, right banks of the superior temporal sulcus (bankssts), left parietal lobe, and right middle temporal gyrus (Figure 2D). As one of the earliest sites showing pathological changes, the entorhinal cortex plays a critical role in the development of this AD subtype. Aging of the entorhinal cortex is associated with increased expression levels of APP genes and MAPT genes, resulting in significant accumulation of β-amyloid (Aβ) and neurofibrillary tangles during the amnestic MCI phase of AD (Li   Lopez et al., 2022). The grid-cell network of the entorhinal cortex, which is considered one of the earliest neurodegenerative regions, is crucial for path integration in humans and rodents (Segen et al., 2022). The anterolateral entorhinal cortex plays a significant role in memory retention, and differences in its volume are associated with the performance on neuropsychological tests for AD (Yeung et al., 2021). Deep brain stimulation has shown promise in improving cognitive function and has prompted clinical trials for the early treatment of AD (Yu et al., 2019). Chemical-protein interaction analysis has revealed that valproic acid is a potential therapeutic agent that can prevent AD progression in this subtype (Bottero et al., 2021).
We have demonstrated that the unique characteristics of the four AD clinical subtypes can effectively reveal multiple mechanisms and heterogeneous clinical manifestations (Table 7). This explains why AD is a syndrome with multiple coexisting mechanisms, providing a favorable basis for further research on multitarget drug interventions in clinical practice. On the one hand, our data validated the effectiveness of the proposed method. On the other hand, our method provides an important basis for early diagnosis and appropriate treatment of AD.

Longitudinal characterization of AD clinical subtypes
Discoveries of temporal changes and patterns exhibited by representative features within each AD clinical subtype may provide novel insights into disease pathogenesis. By examining the time dimension, the pathogenic circuits specific to each subtype were identified, providing valuable insights for precision medicine. We employed the Gini index to evaluate the importance of features for the samples in the four subtypes at different time points. The detailed calculations and results of the Gini index can be found in Supplementary material B.
Distinct change patterns of representative features in the time dimension were observed in the four clinical AD subtypes (Figure 3). In the first subtype, AD first affects the corpus callosum and right cuneus, followed by the right inferior temporal gyrus, left transverse temporal gyri, and left superior frontal gyrus. Ultimately, it affects the left hippocampus, a critical brain region associated with the onset of AD. In the second subtype, AD initially acts on the left anterior cingulate gyrus and then progresses primarily to the left supramarginal gyrus. It eventually affects the right precentral gyrus and right postcentral gyrus, potentially leading to a final stage of general paralysis. In the third subtype, AD initially affects the left temporal lobe, followed by significant involvement of the left supramarginal gyrus and the right occipital lobe. Subsequently, it affects the right superior temporal gyrus, right paracentral lobule, left anterior cingulate gyrus, and right occipital lobe. In the fourth subtype, AD first affects the left entorhinal cortex, followed by the left middle temporal gyrus and right cingulate. Finally, it affects the right transverse temporal gyri, right banks of the superior temporal sulcus, left parietal cortex, and right middle temporal gyrus. Notably, the pathogenic circuits of the four clinical AD subtypes exhibit distinct patterns. The corpus callosum, cingulate gyrus, temporal lobe, and entorhinal cortex serve as the initial sites for the four subtypes, with subsequent overall brain atrophy occurring over time. These different pathogenic circuits result in diverse clinical manifestations influenced by each patient's unique physical condition. The discovery of these pathogenic circuits will help clarify the mechanisms underlying the development of AD.
The National Institute on Aging-Alzheimer's Association (NIA-AA) proposed a biological classification standard for AD according to the ATN classification system, where A denotes Aβ, T denotes tau protein, and N denotes neurodegeneration (Jack et al., 2018). Although ATN biomarkers provide insights into the early-stage neuropathological processes of AD, they do not rely on clinical diagnostic or phenotypic data and thus reflect only the pathophysiological changes of the disease. Despite the widespread use of ATN biomarkers for early detection of AD, they have limitations in explaining the heterogeneity of individual clinical manifestations and predicting the degree of cognitive decline or

Because this longitudinal study was designed to preserve the richness of the data in both the feature and time dimensions, the analyses were limited by a lack of available data. Together with Tangdu Hospital, we are collecting clinical data to extend the existing longitudinal AD dataset. For instance, to predict five-year AD progression, we collected clinical data from participants who had been followed continuously for approximately 10 years, which was challenging. The expanded dataset will provide data support for further research on AD.

FIGURE 1
paired sample t-tests to demonstrate the significance of T-cPCA. We proposed the null hypothesis $H_0: \bar{x}_d \le 0$ (the average difference between the prediction performance of T-cPCA and that of the comparison models in the three evaluation indices is $\le 0$), while the alternative hypothesis is $H_1: \bar{x}_d > 0$ (the average difference between the prediction performance of T-cPCA and that of the comparison models in the three evaluation indices is $> 0$), and our significance level is $\alpha = 0.01$. We then compute the t-statistic and obtain the p-value. If p-value $\le 0.01$

FIGURE 2
Salient-affected brain regions for the four clinical subtypes. Salient-affected brain regions in the (A) first, (B) second, (C) third, and (D) fourth clinical AD subtypes. In each panel, the first row shows, from left to right, the lateral view of the left hemisphere, the top view, and the lateral view of the right hemisphere. The second row shows, from left to right, the medial view of the left hemisphere, the bottom view, and the medial view of the right hemisphere. The third row shows the front and back views.

FIGURE 3
Longitudinal change in the salient-affected brain regions for the four clinical subtypes. Each row shows the key brain regions affected by AD in a given subtype over time.

FIGURE 4
Trend analysis of representative features of the four clinical subtypes. (A) The line charts on the left compare the change in volume of the corpus callosum, which is regarded as the most representative longitudinal feature of the first subtype (p < 0.01). The PET images on the right show the distribution of plaques in the corpus callosum of four typical patients in the first subtype (18F-AV45). (B) The line charts on the left compare the change in cortical thickness standard deviation of the left anterior cingulate, which is regarded as the most representative longitudinal feature of the second subtype (p < 0.01). The PET images on the right show the distribution of plaques in the left anterior cingulate of four typical patients in the second subtype (18F-AV45). (C) The line charts on the left compare the change in volume of the left temporal lobe, which is regarded as the most representative longitudinal feature of the third subtype (p < 0.01). The PET images on the right show the distribution of plaques in the left temporal lobe of four typical patients in the third subtype (18F-AV45). (D) The line charts on the left compare the change in cortical thickness standard deviation of the left entorhinal cortex, which is regarded as the most representative longitudinal feature of the fourth subtype (p = 0.024). The PET images on the right show the distribution of plaques in the left entorhinal cortex of four typical patients in the fourth subtype (18F-AV45). In each panel, the line chart with the red line shows the trend of the representative feature in the corresponding subtype, whereas the line charts with blue lines show the trend of the same feature in the other subtypes for comparison.

TABLE 1
Baseline characteristics of participants in TADPOLE.

TABLE 2
Baseline characteristics of samples in experiments.
importance in the Random Forest (RF) algorithm, and it has been widely used in many fields. Suppose we use $\mathrm{VIM}_j^{\mathrm{gini}}$ to denote the Gini index of the $j$th feature, which represents the average change in node-splitting impurity attributable to the $j$th feature across all RF decision trees. GI denotes the Gini index. Assuming that there are $n$ features $x_1, x_2$

TABLE 3
Performance of predictive models in the measurement of accuracy (ACC), recall value (recall), and F1 score (F1).

TABLE 4
The standard deviations of the model prediction performance in Table 3.

TABLE 5
Performance of MLP and four typical deep neural networks based on T-cPCA and original multidimensional time series in the measurement of accuracy (ACC), recall value (recall), and F1 score (F1).
The bold values are the best results in each column.

TABLE 6
Comparison of performance on the AD progression prediction task before and after clustering.

TABLE 7
Differences among the four clinical subtypes.